Information Preservation in ARIADNE
Authors
Abstract
Preserving digital information is a necessary commitment to the future. In ARIADNE, a project of the Digital Publishing Group at ICAT, we are developing an integrated information system for news processing and publishing. In this paper, we present our perspectives on the information preservation issues addressed by this research. These include the semantic preservation of information classification schemes, preservation of the layout of dynamically generated documents, preservation of the linkage to external collections, and the economic sustainability of the news archive.

INTRODUCTION

Preserving or archiving digital information is, today, an important and necessary commitment to the future. How will we make today’s news available to future readers? Given the importance of this topic, the Digital Publishing Group of ICAT is studying new methods for preserving and processing heterogeneous information in organizations, combining multiple databases under a common framework. In project ARIADNE, jointly developed with Público, a national daily newspaper, we are building a new digital publishing structure, where all the information used and produced by journalists is organized in a common database (containing both data and metadata for collections maintained outside the organization). From the information in this digital library, we generate publications in digital format.

Público already maintains an archive of all the editions of its paper publications. This is a profitable unit within the company. The archive is used by the newspaper's journalists and provides services to external entities. However, as we move into on-line publishing (we are about to release several new publications that will be available exclusively on-line), we need to define the processes for archiving and retrieving previously published on-line information.
These new on-line publications differ significantly from the previous generation, where we did little more than create on-line replicas of the paper editions. Our new publications increasingly behave as interactive user interfaces to databases of multimedia presentations. In the next section we present the global architecture of our system, and then proceed with a more detailed discussion of the preservation issues in our digital library.

ARCHITECTURE

The architecture of ARIADNE is based on a large multimedia data repository, which holds various collections of documents, newspaper articles, databases of readers and authors, places and events. For some collections, namely external publications, we keep only metadata and links for the articles. The global architecture and the main information flow are shown in Figure 1.

Several sources, including news agency feeds, articles created for the paper edition of Público, and external publications, provide news items to the ARIADNE repository. Each article received is submitted to a preprocessing stage, where its metadata is extracted. The articles and news feeds are then converted into a common format based on XML[9], and archived in the collections repository by the software module Loader.

Editions of electronic publications are built in a second stage, using another module, Generator, which selects a group of articles archived in the collections repository and packs them into presentations (or editions). This process finishes by converting the XML sources to HTML, making articles viewable in the current generation of web browsers. With this approach we intend to withstand possible changes in data format standards and sustain our archival mission[3].

[Figure 1: Global architecture and main information flow. Labels: News Agencies Feeds, Público Paper Version, External Publications, Loader, Article (XML), Collections Repository, Publication Templates, Generator, Editions (HTML), Internal Search Engines, External Search Engines]
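The two-stage flow described in the architecture section can be illustrated with a minimal sketch. It is not the ARIADNE implementation: the functions `load_article` and `generate_edition`, the element names `article`, `headline`, and `body`, and the `source` attribute are all hypothetical, chosen only to show a Loader-like step that wraps an item and its metadata in XML, and a Generator-like step that packs archived XML articles into an HTML edition.

```python
import html
import xml.etree.ElementTree as ET
from typing import List


def load_article(source: str, headline: str, body: str) -> str:
    """Loader-like step (hypothetical): store an item and its metadata as XML."""
    article = ET.Element("article", attrib={"source": source})
    ET.SubElement(article, "headline").text = headline
    ET.SubElement(article, "body").text = body
    return ET.tostring(article, encoding="unicode")


def generate_edition(title: str, articles_xml: List[str]) -> str:
    """Generator-like step (hypothetical): pack XML articles into one HTML page."""
    parts = [f"<html><head><title>{html.escape(title)}</title></head><body>"]
    for xml_text in articles_xml:
        article = ET.fromstring(xml_text)
        headline = article.findtext("headline", default="")
        body = article.findtext("body", default="")
        # Escape the text so article content cannot break the HTML output.
        parts.append(f"<h2>{html.escape(headline)}</h2>")
        parts.append(f"<p>{html.escape(body)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


# A list stands in for the collections repository (an assumption for this sketch).
repository = [
    load_article("agency-feed", "Example headline", "Body with <markup> & symbols."),
]
edition_html = generate_edition("Sample edition", repository)
```

Keeping the archived form in XML and producing HTML only at generation time means that a change of delivery format would, under these assumptions, require rewriting only the generation step and not the stored articles.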